Abstract: We study secure source coding with causal disclosure,
under the Gaussian distribution. The optimality of Gaussian
auxiliary random variables is shown in various scenarios.
We explicitly characterize the tradeoff between the rates of
communication and secret key. This tradeoff is the result of a
mutual information optimization under Markov constraints. As
a corollary, we deduce a general formula for Wyner’s Common
Information in the Gaussian setting.

I. Introduction 

There is a growing body of work in secure source coding 
[1], [2], [3], [4], [5], [6], [7]. Most of the problem formulations 
consider distortion at the legitimate receiver and equivocation 
at the eavesdropper. An alternative approach was proposed by 
Yamamoto [8], [9] which replaced the eavesdropper’s equivocation
with the distortion incurred by the eavesdropper’s best
estimate of the information source. The motivation behind this 
formulation is a purely operational approach to the problem 
of secrecy. The choice of distortion function may depend on 
the context in which secrecy is desired. 

Recently, the problem posed by Yamamoto was solved [10] 
and considerably generalized [6], [7], [11]. The salient feature 
of the new approach is the causal disclosure of information 
to the eavesdropper. There are compelling arguments that 
this disclosure is necessary for a robust notion of secure 
communication [7]. This formulation of secrecy is natural 
when understood in a game-theoretic context. A repeated game 
is being played by the adversary versus the communication 
system. Distortion is replaced by a payoff function, while the 
information sequences equate to actions of the players. 

Remarkably, when the payoff is chosen to be the log-loss 
function [12], the above framework recovers results for (normalized)
equivocation-based secrecy [13]. Under this choice of
payoff, the adversary expresses her belief about the distribution 
of the current information symbol, given her knowledge of 
past symbols. Thus, the secure source coding framework of 
[7] generalizes traditional approaches to secrecy. We note that 
with this generality, new challenges arise in certain contexts 
involving uncoded side information at the receiver [14], [15], 
[16]. 

Now, we recall the main result of [7] derived under this
framework: The optimal tradeoff between communication rate
$R$, secret key rate $R_0$ and average payoff $\Pi$ is given by the
union of regions

$$R \geq I(X; U, V), \tag{1}$$

$$R_0 \geq I(D_x, D_y; V \mid U), \tag{2}$$

$$\Pi \leq \min_{z(\cdot)} \mathbb{E}[\pi(X, Y, z(U))], \tag{3}$$

where the union is taken over distributions that enforce the
Markov chain $D_x - X - (U, V) - Y - D_y$. Though the
presentation in [7] restricts itself to the case when all random
variables have finite alphabets, most of the results (lossless
communication is an exception) easily generalize to continuous
random variables.

Fig. 1. The causal disclosure framework for secure source coding [7].
Disclosures $D = (D_x, D_y)$ are allowed, with arbitrary orthogonal
disclosure channels $P_{D_x, D_y | X, Y} = P_{D_x | X} P_{D_y | Y}$.

The scheme used to obtain the above region admits a
simple interpretation. The information source $X^n$ is split into
two parts: a secure part $V^n$ and a non-secure part $U^n$. The
eavesdropper is given full knowledge of $U^n$, while the secret
key is focused on keeping $V^n$ perfectly secure. This can be
implemented using a superposition code [7].

Unlike $V$, the variable $U$ plays a specific and concrete role
in the secure communication system as the information that
is leaked to the eavesdropper, discussed in [7, Section VI-E].
That is, the significance of the distribution of $U$ is more than
simply that of an optimization parameter for the region (1)-(3).

This paper asks the following question: Given that $P_{X,Y,U}$ is
Gaussian (this fixes the bound on $\Pi$), can the communication-key
tradeoff be realized with Gaussian $P_{V|X,Y,U}$? Our primary
motivation for this investigation is a potential application to
the problem of secure rate-limited control. In the context of
control, $Y^n$ may be a Gaussian control signal that is correlated
with the state process $X^n$, and $U^n$ is a Gaussian degradation
of the control that is leaked to the eavesdropper.

Classical control theory [17] provides exact characterizations
of control performance for Gauss-Markov processes. If
the relevant rates are optimized by Gaussian distributions, 
then we can replace rate-limited feedback links with idealized 
Gaussian channels and use these characterizations to derive 
tight bounds on performance. This observation has already 
been used by Tatikonda-Mitter-Sahai [18], [19] to characterize 
optimal performance in rate-limited control with quadratic 
costs. 


It is worth pointing out that if the optimization is carried
out jointly over $(U, V)$ satisfying the Markov constraint, then
Gaussian $P_{U,V|X,Y}$ does not suffice to achieve the entire
rate-payoff region even when $\pi(\cdot, \cdot, \cdot)$ is a quadratic function
[20]. However, the Gaussianity of $U$ is motivated by
operational considerations derived from the coding scheme
described above. Since $U^n$ represents information that is
revealed to the eavesdropper, this is conveniently modeled in
many applications by a linear/additive channel from $X^n$ or
$Y^n$ to $U^n$. Such degradations can often be realized by
physical processes (e.g. optical, electrical). Further theoretical
justification is provided by the worst-additive-noise lemma
[21], [22] when the payoff is pointwise mutual information
T:{y,z) = - log ■ 

Besides the potential application to secure control, the 
problem we consider is interesting in its own right as a mutual 
information optimization under unusual Markov constraints. 
There has been much effort in the information theory community
focused on proving optimality of Gaussian random
variables for various applications [21], [22], [23]. We remark
that recent techniques [23] designed to prove the optimality
of Gaussian auxiliaries seem to be best suited to cases where
the optimization is over random variables at the extremes of
Markov chains [24]. It is unclear if the method can be adapted
to our setting, where the optimization is over an auxiliary in
the middle of a Markov chain.


Our approach will be a strengthening of the estimation- 
theoretic technique used to compute the common information 
of a bivariate Gaussian distribution in [25] (the result first 
appeared in [26], but the proof had a gap that was later 
corrected). As a corollary, we deduce a general formula for 
Wyner’s common information in the Gaussian setting. This 
quantity has proved to be fundamental in various source coding 
problems [25], [27], [28], [29], [30], although most of these 
results consider sources with a finite alphabet. 


II. Notation 

We represent both random variables and probability distribution
functions with capital letters, but only letters $P$ and
$Q$ are used for the latter. The set of real numbers is denoted
by $\mathbb{R}$, while $\mathbb{R}_+$ denotes non-negative reals. We denote the
conditional distribution of the random variable $Y$ given the
random variable $X$ by $P_{Y|X}(y|x)$. This is the usual notation,
although sometimes we do abbreviate it as $P_{Y|X}$. Markov
chains are denoted by $X - Y - Z$, implying the factorization
$P_{X,Y,Z} = P_{X,Y} P_{Z|Y}$, while $X \perp Y$ indicates that the random
variables $X$ and $Y$ are independent. Sequences of random
variables $X_1, \ldots, X_n$ are denoted by $X^n$.

Let $\mathrm{diag}(\{a_i\}_{i=1}^{d}) \in \mathbb{R}^{d \times d}$ denote a diagonal matrix $A$
with diagonal entries $A_{ii} = a_i$. The transpose of a matrix
$A$ is denoted by $A^T$ and $A^{-T} := (A^{-1})^T$. If $X \in \mathbb{R}^d$ is a
$d$-dimensional (column) vector, then $X_{:i}$ denotes the vector
formed by the first $i$ components of $X$.

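We denote the covariance matrix of zero-mean random
vectors $X \in \mathbb{R}^d$ by $\Sigma_X := \mathbb{E}[XX^T] \in \mathbb{R}^{d \times d}$. When $d = 1$,
we set $\sigma_X^2 := \Sigma_X$. For zero-mean random vectors $X, Y \in \mathbb{R}^d$,
the cross-covariance matrix is denoted by $\Sigma_{XY} := \mathbb{E}[XY^T] \in \mathbb{R}^{d \times d}$.
Note that $\Sigma_{XY} = \Sigma_{YX}^T$.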

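Recall that $\Sigma_X$ is real, symmetric and positive semi-definite.
Its eigen-decomposition is given by $\Sigma_X = B_X \Lambda B_X^T$. Let
$r_X := \mathrm{rank}(\Sigma_X)$. We set $\Sigma_X^{-1/2} := B_X \Lambda^{-1/2} B_X^T$
with

$$\Lambda^{-1/2} := \begin{pmatrix} \Lambda_+^{-1/2} & 0 \\ 0 & 0 \end{pmatrix}, \tag{4}$$

where $\Lambda_+ \in \mathbb{R}^{r_X \times r_X}$ is the submatrix of $\Lambda$ with strictly
positive diagonal entries. Note that we have

$$\Sigma_X^{-1/2} \Sigma_X \Sigma_X^{-1/2} = \begin{pmatrix} I_{r_X} & 0 \\ 0 & 0 \end{pmatrix}, \tag{5}$$

where $I_r \in \mathbb{R}^{r \times r}$ is the identity matrix.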

III. Main Result 

In the following, we assume that $X, Y, U \in \mathbb{R}^d$ are jointly
Gaussian random vectors. There is no loss of generality in
assuming the same length for all vectors since we can zero-pad
shorter vectors. For simplicity, we restrict to disclosures
$D_x \in \{\emptyset, X\}$ and $D_y \in \{\emptyset, Y\}$.

Theorem 1. For jointly Gaussian $X, Y, U \in \mathbb{R}^d$, the region (1)-(3)
is optimized by Gaussian $P_{V|XUY}$, such that $X - (U, V) - Y$
holds. In particular, we have the following communication-key
tradeoffs:

• Arbitrary $P_{D_x|X}$ and $D_y = \emptyset$:

$$R \geq I(X; U, Y), \tag{6}$$

$$R_0 \geq I(D_x; Y \mid U), \tag{7}$$

• $D_x = \emptyset$, $D_y = Y$:

• $D_x = X$, $D_y = Y$:

with

$$a_{\lambda,i} = \frac{\lambda \rho_{XY|U,i}^2 + \rho_{XY|U,i} \sqrt{\lambda^2 \rho_{XY|U,i}^2 + 4(\lambda + 1)}}{2(\lambda + 1)}, \tag{13}$$

where $\{\rho_{XY|U,i}\}_{i=1}^{r}$ are the singular values of
$\Sigma_{X|U}^{-1/2} \Sigma_{XY|U} \Sigma_{Y|U}^{-1/2}$, and
the tradeoffs are parametrized by $\lambda \in [0, \infty)$.
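
As a small numerical aid, the closed form in (13) can be evaluated directly. The sketch below is only an illustration under stated assumptions: the function name `a_lambda` is hypothetical, and a single singular value `rho` in (0, 1] stands in for $\rho_{XY|U,i}$.

```python
import math


def a_lambda(rho: float, lam: float) -> float:
    """Evaluate the closed form (13) for one singular value rho and slope parameter lam.

    Assumptions of this sketch: 0 < rho <= 1 and lam >= 0.
    """
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must lie in (0, 1]")
    if lam < 0.0:
        raise ValueError("lam must be non-negative")
    root = math.sqrt(lam ** 2 * rho ** 2 + 4.0 * (lam + 1.0))
    return (lam * rho ** 2 + rho * root) / (2.0 * (lam + 1.0))


if __name__ == "__main__":
    # Print how the coefficient moves as lam grows, for a fixed rho.
    for lam in (0.0, 1.0, 10.0, 1000.0):
        print(lam, a_lambda(0.5, lam))
```

At `lam = 0` the expression reduces to `rho`, and for large `lam` it approaches `rho ** 2`, so the coefficient always stays between these two values.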


A. Interpretation 

The parameter $\lambda$ specifies the $(R, R_0)$ point that is tangent
to a supporting line with slope $-\lambda^{-1}$. Equations (10) and (13)
precisely capture the path traced by the optimizing channels
$P_{X|U,V}$ and $P_{Y|U,V}$ as the line is varied. Note that the first
case ($D_y = \emptyset$) immediately follows from the data-processing
inequality.

The above results are expressed in terms of the singular
values of the correlation matrix $\rho_{XY} := \Sigma_X^{-1/2} \Sigma_{XY} \Sigma_Y^{-1/2}$.
Recall that the linear MMSE estimator is given in terms of
this matrix (for zero-mean random variables) as

$$\mathbb{E}[X|Y] = \Sigma_X^{1/2} \rho_{XY} \Sigma_Y^{-1/2} Y. \tag{14}$$

In the scalar case, this simplifies to

$$\mathbb{E}[X|Y] = \rho_{XY} \frac{\sigma_X}{\sigma_Y} Y, \tag{15}$$

where $\rho_{XY}$ is the correlation coefficient.

We also generalize a result of [25], which considered the
case of scalar Gaussian random variables.

Corollary 1. For jointly Gaussian $X, Y \in \mathbb{R}^d$, Wyner’s
common information is given by

$$C(X; Y) := \min_{U:\, X - U - Y} I(X, Y; U) = \frac{1}{2} \sum_{i=1}^{r} \log \frac{1 + \rho_i}{1 - \rho_i}, \tag{16}$$

where $\{\rho_i\}_{i=1}^{r}$ are singular values of $\Sigma_X^{-1/2} \Sigma_{XY} \Sigma_Y^{-1/2}$.
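
To make the formula in Corollary 1 concrete, the following sketch evaluates (16) in nats from a list of singular values. It is an illustration only: the function name `wyner_common_information` is hypothetical, and the singular values are assumed to be supplied by the caller (computing them from covariance matrices is outside this sketch).

```python
import math


def wyner_common_information(rhos):
    """Evaluate (16) in nats: 0.5 * sum_i log((1 + rho_i) / (1 - rho_i)).

    Assumption of this sketch: every rho_i lies in [0, 1); rho_i = 1 would
    make the expression infinite, so it is rejected here.
    """
    total = 0.0
    for rho in rhos:
        if not 0.0 <= rho < 1.0:
            raise ValueError("each singular value must lie in [0, 1)")
        total += 0.5 * math.log((1.0 + rho) / (1.0 - rho))
    return total


if __name__ == "__main__":
    # Scalar case with correlation coefficient 0.5, then a two-component case.
    print(wyner_common_information([0.5]))
    print(wyner_common_information([0.5, 0.9]))
```

Components with zero correlation contribute nothing to the sum, which matches the intuition that independent components share no common information.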

In the following sections, we only present the proof for
$(D_x, D_y) = (X, Y)$. The proof for the second case is similar
and thus, omitted.


IV. Proof 

Note that the communication rate $I(X; U, V)$ is minimized
by $V = Y$, while the secret key rate $I(X, Y; V \mid U)$ is
minimized by choosing $V$ such that $I(X, Y; V \mid U = u) =
C(X; Y \mid U = u)$ for every $U = u$. In general, it is not possible
to minimize both rates simultaneously.


We consider the optimal frontier of rates by considering the
point at which a supporting hyperplane touches the region. In
other words, we would like to show that

$$\operatorname*{argmin}_{P_{V|XUY}:\, X - (U,V) - Y} \big( \lambda I(X; U, V) + I(X, Y; V \mid U) \big) \tag{17}$$

is attained by a Gaussian distribution for $\lambda \geq 0$. We shall
constructively show that a minimizer exists, so the above
expression is well-defined. In the following, we shall perform
the analysis conditioned on $U$, so it suffices to consider the
problem

$$\operatorname*{argmin}_{P_{V|X,Y}:\, X - V - Y} \big( \lambda I(X; V) + I(X, Y; V) \big). \tag{18}$$

Since $P_{X,Y}$ is Gaussian, linear-Gaussian $P_{V|X,Y}$ ensures
that the joint distribution is Gaussian as well. Note that the
Gaussianity of $V$ is not necessary, since mutual information
is invariant under invertible transformations.


A. Diagonalization 

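Consider $\tilde{X} = \Sigma_X^{-1/2} X$ and $\tilde{Y} = \Sigma_Y^{-1/2} Y$. This is defined
to mean that only the positive eigenvalues are inverted. The
zero eigenvalues remain zero. However, this is still a matrix
inverse, i.e. $X = \Sigma_X^{1/2} \tilde{X}$ with probability one. We have

$$\Sigma_{\tilde{X}} = \begin{pmatrix} I_{r_X} & 0 \\ 0 & 0_{d - r_X} \end{pmatrix}, \qquad \Sigma_{\tilde{Y}} = \begin{pmatrix} I_{r_Y} & 0 \\ 0 & 0_{d - r_Y} \end{pmatrix}.$$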
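Also, we have

$$\Sigma_{\tilde{X}\tilde{Y}} = \Sigma_X^{-1/2} \Sigma_{XY} \Sigma_Y^{-1/2} = \begin{pmatrix} A_{X,Y} & 0 \\ 0 & 0 \end{pmatrix},$$

where $A_{X,Y} \in \mathbb{R}^{r_X \times r_Y}$. By the singular value decomposition,
we have

$$A_{X,Y} = B_X \Lambda B_Y^T,$$

where $\Lambda \in \mathbb{R}^{r_X \times r_Y}$ is diagonal and $B_X \in \mathbb{R}^{r_X \times r_X}$,
$B_Y \in \mathbb{R}^{r_Y \times r_Y}$ are orthogonal matrices. Then with

$$\hat{X} = \begin{pmatrix} B_X^T & 0 \\ 0 & I_{d - r_X} \end{pmatrix} \tilde{X}, \qquad \hat{Y} = \begin{pmatrix} B_Y^T & 0 \\ 0 & I_{d - r_Y} \end{pmatrix} \tilde{Y},$$

we have $\Sigma_{\hat{X}} = \Sigma_{\tilde{X}}$, $\Sigma_{\hat{Y}} = \Sigma_{\tilde{Y}}$ and

$$\Sigma_{\hat{X}\hat{Y}} = \begin{pmatrix} \Lambda & 0 \\ 0 & 0 \end{pmatrix},$$

where the non-zero diagonal entries of $\Lambda$ are $\{\rho_i\}_{i=1}^{r}$
($\rho_i > 0$), the singular values of the correlation matrix
$\Sigma_X^{-1/2} \Sigma_{XY} \Sigma_Y^{-1/2}$.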

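Thus, we have constructed invertible linear transformations
$X \to \hat{X}$ and $Y \to \hat{Y}$ such that $\Sigma_{\hat{X}}$, $\Sigma_{\hat{Y}}$ and $\Sigma_{\hat{X}\hat{Y}}$ are
diagonal. Since mutual information is invariant to invertible
transformations, it suffices to show that

$$\operatorname*{argmin}_{P_{V|\hat{X}\hat{Y}}:\, \hat{X} - V - \hat{Y}} \big( \lambda I(\hat{X}; V) + I(\hat{X}, \hat{Y}; V) \big) \tag{26}$$

is Gaussian.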



B. Achievability Proof 

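Let $r := \min(r_X, r_Y)$. Consider independent random variables
$V, Z_1, Z_2 \sim \mathcal{N}(0, I_r)$. Let $A_\lambda = \mathrm{diag}(\{a_{\lambda,i}\}_{i=1}^{r})$ and
$B_\lambda = \mathrm{diag}(\{b_{\lambda,i}\}_{i=1}^{r})$. Consider ($0 \leq a_{\lambda,i}, b_{\lambda,i} \leq 1$ for all $i$)

$$\hat{X}_{:r} = \sqrt{A_\lambda}\, V + \sqrt{I - A_\lambda}\, Z_1, \tag{27}$$

$$\hat{Y}_{:r} = \sqrt{B_\lambda}\, V + \sqrt{I - B_\lambda}\, Z_2, \tag{28}$$

with $\sqrt{a_{\lambda,i} b_{\lambda,i}} = \rho_i$, so that $\Sigma_{\hat{X}\hat{Y}}$ is realized. It suffices
to generate the remaining $(d - r)$ components of $\hat{X}$ and
$\hat{Y}$ independent of $V$. Note that we have $\hat{X} - V - \hat{Y}$ and
$(\hat{X}, \hat{Y}) \sim P_{\hat{X}\hat{Y}}$ under the above construction.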
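
The construction in (27)-(28) can be sanity-checked by simulation for a single component pair. The sketch below is an illustration under stated assumptions: the helper names `sample_pair` and `empirical_moments` are hypothetical, one scalar coefficient `a` stands in for $a_{\lambda,i}$, and `b` is set to `rho**2 / a` so that the correlation constraint holds.

```python
import math
import random


def sample_pair(rho, a, n, seed=0):
    """Draw n samples of one component pair built as in (27)-(28).

    Assumptions of this sketch: 0 < rho < 1 and rho**2 <= a <= 1, so that
    b = rho**2 / a lies in [0, 1] and sqrt(a * b) = rho.
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie in (0, 1)")
    if not rho ** 2 <= a <= 1.0:
        raise ValueError("need rho^2 <= a <= 1")
    b = rho ** 2 / a
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        # V, Z1, Z2 are independent standard normal variables.
        v, z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(math.sqrt(a) * v + math.sqrt(max(0.0, 1.0 - a)) * z1)
        ys.append(math.sqrt(b) * v + math.sqrt(max(0.0, 1.0 - b)) * z2)
    return xs, ys


def empirical_moments(xs, ys):
    """Return the sample second moments (E[X^2], E[Y^2], E[XY]) for zero-mean data."""
    n = len(xs)
    var_x = sum(x * x for x in xs) / n
    var_y = sum(y * y for y in ys) / n
    cov_xy = sum(x * y for x, y in zip(xs, ys)) / n
    return var_x, var_y, cov_xy


if __name__ == "__main__":
    xs, ys = sample_pair(rho=0.6, a=0.5, n=100000, seed=1)
    print(empirical_moments(xs, ys))
```

With enough samples, the two variances should be close to 1 and the cross moment close to `rho`, whatever admissible value of `a` is chosen; the choice of `a` only changes how the common part `V` is shared between the two sides.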

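Under this distribution, we have (since $\hat{X}_i - V_i - \hat{Y}_i$ for all $i$)

\begin{align}
& \lambda I(\hat{X}; V) + I(\hat{X}, \hat{Y}; V) \tag{29} \\
&= \lambda I(\hat{X}_{:r}; V) + I(\hat{X}_{:r}, \hat{Y}_{:r}; V) \tag{30} \\
&= \sum_{i=1}^{r} \big( \lambda I(\hat{X}_i; V_i) + I(\hat{X}_i, \hat{Y}_i; V_i) \big) \tag{31} \\
&= \sum_{i=1}^{r} \big( (\lambda + 1) I(\hat{X}_i; V_i) + I(\hat{Y}_i; V_i) - I(\hat{X}_i; \hat{Y}_i) \big) \tag{32} \\
&= \sum_{i=1}^{r} \left( \frac{1}{2} \log \frac{1}{(1 - a_{\lambda,i})^{\lambda+1} (1 - b_{\lambda,i})} - \frac{1}{2} \log \frac{1}{1 - \rho_i^2} \right) \tag{33} \\
&= \sum_{i=1}^{r} \frac{1}{2} \log \frac{1 - \rho_i^2}{(1 - a_{\lambda,i})^{\lambda+1} (1 - b_{\lambda,i})}. \tag{34}
\end{align}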

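Now, set

$$a_{\lambda,i} = \frac{\lambda \rho_i^2 + \rho_i \sqrt{\lambda^2 \rho_i^2 + 4(\lambda + 1)}}{2(\lambda + 1)} \tag{35}$$

to achieve (11)-(12). It is easy to check that $\rho_i^2 \leq a_{\lambda,i} \leq 1$,
which respects the correlation constraint $\sqrt{a_{\lambda,i} b_{\lambda,i}} = \rho_i$. Note
that $a_{0,i} = \rho_i$, which recovers the construction of [25]. Also,
$\lim_{\lambda \to \infty} a_{\lambda,i} = \rho_i^2$, which reflects the fact that $\hat{Y}_i = V_i$ with
probability 1 as $\lambda \to \infty$.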
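
The tradeoff traced by this construction can be tabulated numerically for a single scalar pair. The sketch below is an illustration under stated assumptions: $U$ is taken to be trivial, rates are in nats, and the function names `a_lambda` and `tradeoff_point` are hypothetical. It uses $I(\hat{X}_i; V_i) = \frac{1}{2}\log\frac{1}{1 - a_{\lambda,i}}$ for the communication rate and the per-component summand of (33) with $\lambda = 0$ for the key rate $I(\hat{X}_i, \hat{Y}_i; V_i)$.

```python
import math


def a_lambda(rho, lam):
    """Closed form (35) for one component; assumes 0 < rho < 1 and lam >= 0."""
    root = math.sqrt(lam ** 2 * rho ** 2 + 4.0 * (lam + 1.0))
    return (lam * rho ** 2 + rho * root) / (2.0 * (lam + 1.0))


def tradeoff_point(rho, lam):
    """Return (R, R0) in nats for one component pair under the Gaussian construction."""
    a = a_lambda(rho, lam)
    b = rho ** 2 / a  # enforces sqrt(a * b) = rho
    rate = 0.5 * math.log(1.0 / (1.0 - a))  # I(X_i; V_i)
    key = 0.5 * math.log((1.0 - rho ** 2) / ((1.0 - a) * (1.0 - b)))  # I(X_i, Y_i; V_i)
    return rate, key


if __name__ == "__main__":
    for lam in (0.0, 0.5, 1.0, 2.0, 5.0):
        print(lam, tradeoff_point(0.5, lam))
```

At `lam = 0` the key rate equals the common information $\frac{1}{2}\log\frac{1+\rho}{1-\rho}$ of the pair; as `lam` grows, the communication rate decreases toward $I(X; Y)$ while the key rate increases.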


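C. Converse Proof

The following lemma shall be crucial in establishing the
optimality of our construction. This is essentially the Cauchy-Schwarz
inequality. In [25], the AM-GM inequality and the
orthogonality principle in optimal estimation were used in the
converse proof. This approach does not work here due to the
asymmetry introduced by the parameter $\lambda$.

Lemma 1. For unit variance random variables $X, Y \in \mathbb{R}$ and
any $P_{V|X,Y}$ such that $X - V - Y$, we have

$$\rho_{XY}^2 = \mathbb{E}[XY]^2 \leq \mathbb{E}\big[\mathbb{E}[X|V]^2\big]\, \mathbb{E}\big[\mathbb{E}[Y|V]^2\big]. \tag{36}$$

Proof: By $X - V - Y$ and the Cauchy-Schwarz inequality,
we have

\begin{align}
\mathbb{E}[XY]^2 &= \mathbb{E}_V\big[\mathbb{E}[XY|V]\big]^2 \tag{37} \\
&= \mathbb{E}_V\big[\mathbb{E}[X|V]\, \mathbb{E}[Y|V]\big]^2 \tag{38} \\
&\leq \mathbb{E}\big[\mathbb{E}[X|V]^2\big]\, \mathbb{E}\big[\mathbb{E}[Y|V]^2\big]. \tag{39}
\end{align}

Consider any $P_{V|\hat{X}\hat{Y}}$ such that $\hat{X} - V - \hat{Y}$ (so $\hat{X}_i - V - \hat{Y}_i$
holds for all $i$). Note that we don’t make any structural
assumptions on $V$ here. Let $D_{\hat{X}_i} := \mathbb{E}[(\hat{X}_i - \mathbb{E}[\hat{X}_i|V])^2]$
and $D_{\hat{Y}_i} := \mathbb{E}[(\hat{Y}_i - \mathbb{E}[\hat{Y}_i|V])^2]$. Using standard information-theoretic
inequalities, we have

\begin{align}
& \lambda I(\hat{X}; V) + I(\hat{X}, \hat{Y}; V) \tag{40} \\
&\geq \lambda I(\hat{X}_{:r}; V) + I(\hat{X}_{:r}, \hat{Y}_{:r}; V) \tag{41} \\
&= \lambda \sum_{i=1}^{r} I(\hat{X}_i; V \mid \hat{X}_{:i-1}) + \sum_{i=1}^{r} I(\hat{X}_i, \hat{Y}_i; V \mid \hat{X}_{:i-1}, \hat{Y}_{:i-1}) \tag{42} \\
&= \lambda \sum_{i=1}^{r} I(\hat{X}_i; V, \hat{X}_{:i-1}) + \sum_{i=1}^{r} I(\hat{X}_i, \hat{Y}_i; V, \hat{X}_{:i-1}, \hat{Y}_{:i-1}) \tag{43} \\
&\geq \sum_{i=1}^{r} \big( \lambda I(\hat{X}_i; V) + I(\hat{X}_i, \hat{Y}_i; V) \big) \tag{44} \\
&= \sum_{i=1}^{r} \big( (\lambda + 1) I(\hat{X}_i; V) + I(\hat{Y}_i; V) - I(\hat{X}_i; \hat{Y}_i) \big) \tag{45} \\
&\geq \sum_{i=1}^{r} \big( (\lambda + 1) I(\hat{X}_i; \mathbb{E}[\hat{X}_i|V]) + I(\hat{Y}_i; \mathbb{E}[\hat{Y}_i|V]) - I(\hat{X}_i; \hat{Y}_i) \big) \tag{46} \\
&\geq \sum_{i=1}^{r} \big( (\lambda + 1) R_{\hat{X}_i}(D_{\hat{X}_i}) + R_{\hat{Y}_i}(D_{\hat{Y}_i}) - I(\hat{X}_i; \hat{Y}_i) \big) \tag{47} \\
&= \sum_{i=1}^{r} \left( \frac{\lambda + 1}{2} \log \frac{1}{D_{\hat{X}_i}} + \frac{1}{2} \log \frac{1}{D_{\hat{Y}_i}} - \frac{1}{2} \log \frac{1}{1 - \rho_i^2} \right), \tag{48}
\end{align}

where (46) follows from the data-processing inequality and
(47) follows from the definition of the Gaussian rate-distortion
function $R_X(\cdot)$ [31, Theorem 10.3.2].

Since $\hat{X}_i - V - \hat{Y}_i$ holds for all $i$, using

\begin{align}
D_{\hat{X}_i} &= \mathbb{E}\big[(\hat{X}_i - \mathbb{E}[\hat{X}_i|V])^2\big] \tag{49} \\
&= \mathbb{E}[\hat{X}_i^2] - \mathbb{E}\big[\mathbb{E}[\hat{X}_i|V]^2\big] \tag{50} \\
&= 1 - \mathbb{E}\big[\mathbb{E}[\hat{X}_i|V]^2\big], \tag{51}
\end{align}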


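and similarly

$$D_{\hat{Y}_i} = 1 - \mathbb{E}\big[\mathbb{E}[\hat{Y}_i|V]^2\big], \tag{52}$$

we have from Lemma 1 that (recall that $\mathbb{E}[\hat{X}_i \hat{Y}_i] = \rho_i$)

\begin{align}
& \rho_i^2 \leq (1 - D_{\hat{X}_i})(1 - D_{\hat{Y}_i}) \tag{53} \\
\implies\ & \rho_i^2 + D_{\hat{Y}_i}(1 - D_{\hat{X}_i}) \leq (1 - D_{\hat{X}_i}) \tag{54} \\
\implies\ & \frac{\rho_i^2}{1 - D_{\hat{X}_i}} + D_{\hat{Y}_i} \leq 1 \tag{55} \\
\implies\ & D_{\hat{Y}_i} \leq 1 - \frac{\rho_i^2}{1 - D_{\hat{X}_i}}. \tag{56}
\end{align}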
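
The chain from Lemma 1 to (56) can be checked on a toy example with a finite-alphabet $V$. The sketch below is an illustration under stated assumptions: the function name `markov_moments` and the example numbers are hypothetical, and $X$, $Y$ are assumed zero-mean, unit-variance and conditionally independent given $V$, so only the conditional means matter.

```python
def markov_moments(probs, g, h):
    """Compute (rho, D_X, D_Y) for a finite-alphabet V.

    probs[v] = P(V = v), g[v] = E[X | V = v], h[v] = E[Y | V = v].
    Assumptions of this sketch: X and Y are zero-mean with unit variance and
    X - V - Y holds, so that E[XY] = E[E[X|V] E[Y|V]].
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must sum to one")
    mean_g = sum(p * gv for p, gv in zip(probs, g))
    mean_h = sum(p * hv for p, hv in zip(probs, h))
    if abs(mean_g) > 1e-9 or abs(mean_h) > 1e-9:
        raise ValueError("conditional means must average to zero")
    eg2 = sum(p * gv * gv for p, gv in zip(probs, g))
    eh2 = sum(p * hv * hv for p, hv in zip(probs, h))
    if eg2 > 1.0 or eh2 > 1.0:
        raise ValueError("E[E[.|V]^2] cannot exceed the unit variance")
    rho = sum(p * gv * hv for p, gv, hv in zip(probs, g, h))
    return rho, 1.0 - eg2, 1.0 - eh2  # D_X and D_Y as in (51)-(52)


if __name__ == "__main__":
    rho, d_x, d_y = markov_moments([0.25, 0.5, 0.25], [-1.0, 0.0, 1.0], [-0.5, 0.2, 0.1])
    print(rho ** 2 <= (1.0 - d_x) * (1.0 - d_y))   # inequality (53)
    print(d_y <= 1.0 - rho ** 2 / (1.0 - d_x))     # inequality (56)
```

Because the two conditional-mean profiles in this example are not proportional, the Cauchy-Schwarz step is strict and both inequalities hold with slack.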


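Inserting (56) into (48), we find that minimizing (48) is
equivalent to

$$\text{maximize} \quad D_{\hat{X}_i}^{\lambda+1} \left( 1 - \frac{\rho_i^2}{1 - D_{\hat{X}_i}} \right) =: f_\lambda(1 - D_{\hat{X}_i}), \tag{57}$$

for $1 \leq i \leq r$, where

$$f_\lambda(x) = \frac{(1 - x)^{\lambda+1} (x - \rho_i^2)}{x}, \tag{58}$$

and the maximization is carried out over functions $D_{\hat{X}_i}(\lambda)$,
where $D_{\hat{X}_i} : \mathbb{R}_+ \to [0, 1]$. Note that $D_{\hat{Y}_i}(\lambda) \geq 0$ and (56)
imply that $0 \leq D_{\hat{X}_i}(\lambda) \leq 1 - \rho_i^2 \implies \rho_i^2 \leq (1 - D_{\hat{X}_i}(\lambda)) \leq 1$.
In order to maximize the above expression, we set
$\frac{d f_\lambda}{dx} = 0$, which reduces to the quadratic
$(\lambda + 1) x^2 - \lambda \rho_i^2 x - \rho_i^2 = 0$ in $x = 1 - D_{\hat{X}_i}$. Its positive
root gives

$$D_{\hat{X}_i}(\lambda) = 1 - \frac{\lambda \rho_i^2 + \rho_i \sqrt{\lambda^2 \rho_i^2 + 4(\lambda + 1)}}{2(\lambda + 1)}. \tag{59}$$

Since $f_\lambda(\rho_i^2) = f_\lambda(1) = 0$, $f_\lambda(x) \geq 0$ for $\rho_i^2 \leq x \leq 1$ and
$f_\lambda$ is smooth, this critical point must be the maximum. Since
the resulting value $D_{\hat{X}_i}(\lambda) = 1 - a_{\lambda,i}$ was achieved by the Gaussian
construction, we conclude that Gaussian auxiliaries suffice for
achieving the optimal rate frontier.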
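
The maximization step can be cross-checked numerically by comparing a brute-force grid search over $f_\lambda$ in (58) with the closed-form coefficient of (35). The sketch below is an illustration only: the function names `a_lambda`, `f_lambda` and `grid_argmax` are hypothetical, and a plain grid search is used here as a stand-in for the calculus argument, not as the method of the proof.

```python
import math


def a_lambda(rho, lam):
    """Closed-form maximizer x = 1 - D from (59); assumes 0 < rho < 1, lam >= 0."""
    root = math.sqrt(lam ** 2 * rho ** 2 + 4.0 * (lam + 1.0))
    return (lam * rho ** 2 + rho * root) / (2.0 * (lam + 1.0))


def f_lambda(x, rho, lam):
    """Objective (58); the base is clamped at 0 to guard against rounding at x = 1."""
    return max(0.0, 1.0 - x) ** (lam + 1.0) * (x - rho ** 2) / x


def grid_argmax(rho, lam, steps=20000):
    """Return the grid point in [rho^2, 1] at which f_lambda is largest."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie in (0, 1)")
    lo, hi = rho ** 2, 1.0
    best_x, best_val = lo, f_lambda(lo, rho, lam)
    for k in range(1, steps + 1):
        x = lo + (hi - lo) * k / steps
        val = f_lambda(x, rho, lam)
        if val > best_val:
            best_x, best_val = x, val
    return best_x


if __name__ == "__main__":
    for lam in (0.0, 1.0, 2.0, 10.0):
        print(lam, grid_argmax(0.5, lam), a_lambda(0.5, lam))
```

For each `lam`, the two printed values should agree up to the grid spacing, which is consistent with the critical point being the unique maximizer on the interval.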